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Abstract Many physical systems are described by nonlinear differential equations that are 
too complicated to solve in full. A natural way to proceed is to divide the variables into those 
that are of direct interest and those that are not, formulate solvable approximate equations for 
the variables of greater interest, and use data and statistical methods to account for the impact 
of the other variables. In the present paper the problem is considered in a fully discrete-time 
setting, which simplifies both the analysis of the data and the numerical algorithms. The resulting 
time series are identified by a N ARM AX (nonlinear autoregression moving average with exogenous 
input) representation familiar from engineering practice. The connections with the Mori-Zwanzig 
formalism of statistical physics are discussed, as well as an application to the Lorenz 96 system. 


1 Introduction and outline 


There are many problems in science where the equations of motion are too complex for full solution, 
either because the equations are not certain or because the computational cost is too high, but one 
is interested only in the dynamics of a subset of the variables. Such problems appear, for example, 
in weather and climate modeling, see e.g. [1,2], in economics, see e.g. [3], in statistical mechanics, 
see e.g. [4,5], and in the mechanics of turbulent flow, see e.g. [6,7]. In these settings it is natural 
to look for simpler equations that involve only the variables of interest, and then use data to assess 
the effect of the remaining variables on the variables of interest in some approximate way. In 
the present paper we focus on stochastic methods for doing this, with data coming either from 
experimental observations or from prior numerical calculations. 

Consider a set of differential equations of the form: 


= R{x, y), = S{x, y), 


( 1 ) 


where t is the time, x = (xi, X2, .. •, Xm) is the vector of resolved variables, and y = {yi,y2, ■ ■ ■ ,yi) 
is the vector of unresolved variables, with initial data x(0) = a, y(0) = f3. Consider a situation 
where this system is too complicated to solve, but where data are available, usually as sequences 
of measured values of x, assumed here to be observed with negligible observation errors. One can 
write R{x, y) in the form 

R{x,y) = Ro{x) + z{x,y), (2) 

where Rq approximates R{x, y) in some sense and is such that one is able to solve the equation 


-X = floM. 


(3) 


The remainder z{x, y) = R{x, y) — Ro{x), often called the “unresolved tendency”, is the contribution 
of the unresolved variables to the equation for x. Recent work has shown that z can represent model 
error [8,9] or model noise [10]; we call 2 simply the “noise”. 
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A usual approach to the problem of computing x is to identify z from from data [11-13], i.e., 
find a concise approximate representation z of z that can be evaluated on the computer, and then 
solve the equation 

= Ro{x) + z. (4) 

at 

When z is a stochastic process, this is a “stochastic parametrization”. Equation (4) is an instance 
of a dimensionally-reduced equation, in the sense that it has fewer variables than equations (1). 
However, this approach has some difficulties. In general the available data are measurements of 
X, not of z; to find z so that it can be identified one has to use equation (2), and in particular 
differentiate x numerically, which is generally impractical or inaccurate because z may have high- 
frequency components or fail to be sufficiently smooth, and the data may not be available at 
sufficiently small time intervals (an illuminating analysis in a special case can be found in [14,15]). 
If one can successfully identify z, equation (4) generally becomes a nonlinear stochastic differential 
system, where in general z at a given time depends on earlier values of x and z (see the next 
section), that may be hard to solve with sufficient accuracy (see e.g [16,17]). 

Here we take a different approach. We note that equations (3) and (4) are always solved on the 
computer, i.e., in discrete form, that the data are always given at a discrete collection of points, 
and that one wishes to determine x but in general one is not interested in determining z. We can 
therefore avoid the difficult detour through a continuous z followed by a discretization, by working 
entirely in the discrete setting as follows. We can pick once and for all a particular accurate 
discretization of equation (3) with a particular time step 5, 

x'^+^ = x'^ + SRsix'^), 

where Rs is obtained, for example, from by a Runge-Kutta method, and where n indexes the 
result after n steps. As in the continuous case, this equation has to be corrected to account for 
the impact of the continuous variables, and here also for the possible inaccuracy of the numerical 
scheme. We use the data to identify the discrepancy sequence, z^^^ = —x^)/6—Rs{x^), which 

are available from x data without approximation. This is equivalent to identifying the following 
reduced system 

x^+^ =x'^ + 6Rs{x'^) + 6z'^+\ ( 5 ) 

which constitutes a discrete stochastic parametrization. 

We assume, as one does in the continuous case, that the system under consideration is ergodic, 
so that its long-time statistics are stationary. The sequence z^ becomes a stationary time series, 
which we represent by the widely used N ARM AX (nonlinear auto-regression moving average with 
exogenous inputs) representation, with x as an exogenous input. This representation makes it 
possible to integrate the numerical scheme into the reduced equations, and to take into account 
efficiently the non-Markovian features of the reduced system as well as model and numerical errors. 
There is no stochastic differential system to solve. It is important to note that identifying zs can be 
very different from identifying the continuous z. The question, in what sense does zs approximate 
z, is not relevant, since the goal is to calculate x and z^ is sufficient for the purpose. Note that 
zs should be a good approximation of z whenever equation (3) can be effectively approximated 
by the first-order Euler scheme. Eor practical purposes, the discrete stochastic parametrization 
accomplishes everything that would be accomplished by a successful continuous parametrization 
followed by an accurate approximation. Erom now on, we drop the hat that distinguishes between 
a process and its identification and the subscript 5 m. zs- 

The rest of the paper is organized as follows. In the next section we summarize the N ARM AX 
methodology we use, make comparisons with earlier work, and explain the connections with the 
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Mori-Zwanzig formalism of statistical physics. In section 3 we present our test problem, the Lorenz 
96 equations. In section 4 we present numerical results. In the final section we draw conclusions. 


2 The NARMAX representation 

We represent z in the reduced system (5) by a variant of the NARMAX representation, see e.g. 
[18-21]. Its advantages are its generality, and the extensive experience in its use. Most of the earlier 
identifications can be viewed as special case of this representation, as explained below. The greater 
generality of the NARMAX approach will be increasingly important as model reduction methods 
are applied to more complex problems. Equation (5) becomes 

+ SRs{x^) + ^n+l (g) 

for n = 1,2,..., where the ^*^+1 are independent Gaussian random variables with mean zero and 
variance cr^, and is the sum: 

P VS q 

4”=M+E +X E kiPiix "-’)+E (7) 

j=l j=l i=l j=l 

/r, and {aj,bij,Cj} are parameters to be inferred from data, and the exogenous inputs Pi, i = 
1,..., s are functions to be determined; to simplify the notations, equation (7) has been written as 
if equation (6) were scalar. This is the NARMAX representation of x and z. In equation (7), the 
terms in z are the autoregression part of order p, the terms in ^ are the moving average part of 
order q, and the Pi{x) are the exogenous inputs. Note that in the reduced system (6), the z” term 
can be eliminated; indeed, if we substitute z” = {x"^ — x'^~^)/5 — Rs{x^~^) into (7) and the second 
of equation in (6), we obtain a NARMA representation for x. A suitable choice of the functions 
Pi will be discussed in the context of a specific example and will connect the representation to the 
approximation of equation (3). 

To implement the NARMAX representation, one has to determine its structure and estimate 
the parameters. While NARMAX has been widely studied (see e.g. [18, 22-25] and references 
therein), one should use the earlier work with caution, especially in the detection of structure 
by least-squares-based methods, because in the standard NARMAX, unlike here, the exogenous 
process is independent of the noise process. Suppose one has selected a structure, that is, chosen 
the functions Pi and the orders {p,r,s,q) in the representation (7). Since the representation is 
linear in the parameters 9 = {p,aj,bij,Cj), these parameters can be estimated using conditional 
likelihoods as follows. The sequence {z^} for n = 1,2,..., N can be computed from the data 
using the first equation in (6). Once the values of ..., are known, the noise sequence for 
q + 1 < n < N can be computed from = z” — This leads to the conditional log-likelihood 
of the observations x” for q + 1 < n < N (up to a constant); 


i{9\e,...,n 


N 


E 

n=q+l 


_ (|,n|2 


N-q 
2 


Ina^ 


Since the NARMAX representation is linear in its parameters other than a'^, the log-likelihood 
l{9\^^,... ,^'?) is quadratic in these parameters, its gradient can be easily computed and the maxi¬ 
mum likelihood estimator (MLE) can be obtained by standard gradient-based optimization meth¬ 
ods, such as the quasi-Newton method. If the reduced system is ergodic, the MLE is asymptotically 
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consistent, and the initial values of do not affect the result (see e.g. [21,24]). For conve¬ 
nience, we set = • • • = = 0. 

We remark that for the above Gaussian likelihood, the MLE is the same as the least squares 
estimator for the parameters, which has been widely used in control (see e.g. [22,23]). As shown 
in [22], the above estimation procedure can be made recursive; also, the noise sequence need not 
be Gaussian. 

The detection of the representation’s structure, however, is less straightforward, as is well- 
known, see [18,20]. Because in our problem the exogenous processes are not independent of the 
noise, popular techniques such as orthogonal least squares and error reduction ratios (see e.g. [18] 
and references therein), do not work. We use the following criteria for selecting the structure of 
the representation: (1) it should fit the long-term statistics of the resolved variables, such as the 
mean and autocorrelation function; (2) as the size of the data increase, the estimated parameters 
should converge; (3) the estimated parameters should reflect features of the resolved variables, such 
as symmetry properties. We find structures that satisfy these criteria by trial and error. 

It is of interest to relate the NARMAX representation to the Mori-Zwanzig (MZ) formalism 
[4,5,26,27]. The overall setting in the MZ representation is the same as here: one has data a for 
the X variables, and one samples data /3 for y from a given initial pdf. The MZ formalism creates 
the approximation Ro{x) in equation (3) by conditional expectation: 

Roix) = E[R{x,y)\x], 

where E[a|5] is the expected value of a with respect to the initial measure given b. When the 
system is ergodic and the initial pdf for /? is the invariant measure conditioned by a, this is the 
best least-squares approximation of R{x, y) by a function of x. The MZ formalism then yields an 
expression for z{x, y) in equation (2) as a sum of a noise and a non-Markovian memory/dissipation 
term, corresponding to ^*^+1 and in equation (7); note that in (7) Rq is not restricted to the 

MZ recipe. The MZ expressions are exact, and prove the need for the representation of z to take 
the memory into account by including information from earlier steps. 

Once the initial data y{0) = /3 have been sampled, the MZ equations are deterministic; the 
MZ formalism proposes to follow in time one particular sample path of the system. For a chaotic 
system such as the one discussed in the next section, this may not be computationally feasible. 
The representation here looks for sample paths of a stationary stochastic process whose statistics 
agree with the statistics dehned by the equations of motion. This is a related but less ambitious 
and more feasible task. 

The evaluation of the various terms in the MZ formalism is difficult; as far as we know, there is 
only one case in the literature [28] where it has been successfully carried out in full for a nonlinear 
problem that is not completely trivial. The MZ formalism is a useful starting point for analytic 
approximations (see e.g. [28,29]) but it is difficult to use it to construct reduced models from data 
when suitable analytic approximations are not available. The formalism here is a more tractable 
way to use data for dimensional reduction, and generalizes the MZ formalism to a broader class of 
approximations. 

There is a large literature on data-based dimensional reduction. In [12], see also [30], the noise 
z is represented as the sum of an approximating polynomial in x obtained by regression and a 
one-step autoregression; the details are in the next section where we compare our results to those 
in [12]. The shortcomings of this representation as a general tool are that it does not allow z to 
remember past values of x, and that the autoregression term is not necessarily small, making it 
difficult to solve the equations accurately. Furthermore, in [12], numerical values of a continuous 
z are obtained by hnite differences. In [13, 31] the noise is represented by a conditional Markov 
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chain that depends on both current and past values of x; the Markov chain is deduced from data 
by binning and counting, assuming that exact observations of 2 ; are available. It should be noted 
that the Markov chain representation is intrinsically discrete, making this work close to ours in 
spirit. In [32] the noise is treated as continuous and represented in a form that is partly analogous 
to the NARMAX representation once one translates from the continuum to the grid. An earlier 
construction of a reduced approximation can be found in [10], where the approach was not yet fully 
discrete. Other interesting related work can be found in [33-36]. 


3 The Lorenz 96 equations 


The Lorenz 96 equations [37] are a set of chaotic differential equations that is often used as a 
metaphor for the atmosphere. It has been widely used as a test bench for various dimensional 
reduction and stochastic parametrization methods [12,13,30,31,38]. Following [13], we use the 
following formulation of the equations: 


~j^yj,k = ~[yj+i,k{yj-i,k ■ 


- 2 ) — Xk + F + Zk, 

~ yj+2,k) yj,k T hyXy^ 


with Zk = ^ yj,k, and k = I,..., K, j = 1,..., J. The indices are cyclic, Xk = Xk+K, yj,k = 
yj,k+K and yj+j,k = yj,k+i- The system is invariant under spatial translations, and the statistical 
properties are identical for all Xk- The formulation here is equivalent to the original formulation by 
Lorenz (see e.g. [31,38]); the parameter e measures the time-scale separation between the resolved 
variables Xk and the unresolved variables yj^k- We set e = 0.5, so that there is no significant time 
scale separation between resolved and unresolved processes, as is both more realistic and more 
difficult to handle for parameterizations (see [38] and references therein). The other parameters 
are K = 18, J = 20, F = 10, hx = —1 and hy = 1. The ergodicity of the Lorenz 96 system has been 
numerically verified in earlier work (see e.g. [38]) and we do not revisit this issue. 

In the following section we present numerical results produced by our NARMAX scheme and 
compare them to those in [12] labeled POLYAR. We do not compare with the results of [13] because 
they require exact observations of 2 ;. In [12], the continuous 2 ; is estimated by finite differences: 


... _ Xk{t + 6) -Xk{t) 

Zk{t) ^ - - - Xk-l (Xfc+i - Xk-2) +Xk- F. 


Then a polynomial regression of the form 2 ;fc(f) = P{xk{t))+rik{'t) is used to fit the data {xfc(nJ), 2;fc(n<5)}, 
where P{x) is an approximating polynomial obtained by least squares, and r]k{t) is a one-step au¬ 
toregression (AR(1)) with parameters estimated from the time series Zk{nS) — P{xk{n6)), for 
n = 1, 2,. This leads to the following reduced stochastic equation: 


—Xk = Xk-i (xfc+i - Xk-2) -Xk + F + P{xk) + rjk, 
where rjk is an autoregression of the form 


( 8 ) 


r]k{t + 6) = (l)r]k{t) + (T^k{t) (9) 

where (f), a are constants deduced from the data, and the ^kit) are independent identically dis¬ 
tributed Gaussian random variable with mean zero and variance one, for each component k = 
1,... ,K of the equation. This reduced system is solved as follows: given the current time vectors 
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Table 1: The estimated parameters in the POLYAR system; the column labeled j for j = 0,..., 5 
contains the coefficient of x to the power j. 




5 

4 

3 

2 

1 

0 

0 

a 

6 

= 0.01 

-0.00002 

0.0004 

-0.0002 

-0.0258 

-0.3567 

0.0529 

0.9948 

0.9397 

5 

= 0.05 

-0.00003 

0.0009 

-0.0035 

-0.0137 

-1.0030 

1.3919 

0.7626 

11.3857 


Table 2: The estimated parameters in the NARMAX model. 




ai 

^1.1 

bi,2 

di 



a" 

6 = 

0.01 

0.9782 

ai 

-0.1271 

^1,1 

0.1132 

^ 1,2 

0.9997 

^ 1,3 

Cl,l 

0.0115 

0.0004 

6 = 

0.05 

0.8879 

-0.0712 

-0.0002 

0.0002 

-0.0084 

0.0556 

0.0284 


{rjk{t) ,Xk{t)), the next time-step r]k{t + 6) is calcnlated from (9), and then Xk{t + 6) is computed 
by integrating (8) by a fourth-order Runge-Kutta methods, with r]{t) kept constant during each 
time step. 

In the NARMAX scheme, we use the representation (7), choosing one of the functions Pi{x) to 
be Rs{x) from the approximation (3) and the others to be powers of x. This connects the numerical 
scheme with the representation of the noise. We select the structure and estimate the parameters 
as described earlier. The parameters are the same for each spatial component, refecting the spatial 
symmetry of the equation. Each component of 2 ; remembers only its own past and the past of the 
corresponing component of x. The term in equation (7)becomes: 

j=i j=i 1=1 

+ E E CiAMx"-’))' + E (10) 

j=l 1=1 j=l 

The determination of the numerical parameters in this representation is part of the calculation and 
the values used are listed in the next section. 

4 Numerical results 

In the numerical runs, we generate data for Xk by integrating the full Lorenz 96 system with 
parameters (e, K, J, F, hx, hy) = (0.5,18, 20,10, —1,1), using a fourth order Runge-Kutta method 
with time step 0.001. We consider two cases: one in which the observations are made at intervals 
of 5 = 0.01 and one at which they are made at intervals 5 = 0.05; the first case corresponds to a 
case discussed in ( [13,31]); in the second case, the data are slightly sparser. In each case, we make 
observations at at N = 5 x 10^ points; this requires that the full system be integrated for 5000 and 
25000 time units, respectively. 

For the POLYAR equations of [12], a fifth-order polynomial is used for both observation settings. 
Increasing the order further produces small coefficients for the higher degree terms, which do not 
reduce the variance of noise. The estimated parameters, i.e. the coefficients of the polynomial and 
the parameters for the autoregression, for the first component xi of x are shown in Table 1. 

For the NARMAX equation (10), we estimated {p,r,s,q) = (1,2,0,1) and [dx^dn) = (1,0) for 
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Figure 1: Probability density functions (pdf), autocorrelation functions (ACF) and cross-correlation 
functions (CCF) of xi, produced by the full Lorenz 96 model, the POLYAR system and the NAR- 
MAX system. The left column is the case <5 = 0.01, and the right column is the case <5 = 0.05. 
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Table 3: The mean, standard deviation and the Kolmogorov-Smirnov statistic (D). 




5 = 0.01 



5 = 0.05 


mean 

std. 

D 

mean 

std. 

D 

full Lorenz 96 

2.4506 

3.5230 


2.3978 

3.5222 


POLYAR 

2.5335 

3.3807 

0.0183 

2.6031 

2.8564 

0.0747 

NARMAX 

2.4113 

3.5270 

0.0055 

2.4293 

3.5402 

0.0049 


the case 6 = 0.01, and for the case 5 = 0.05, {p,r,s,q) = (1,1,1,0) and {dx,dR) = (3,1). The 
estimated parameters for xi are shown in Table 2. 

First, we compare the statistics of the solutions generated by the two reduced systems with the 
statistics of the full system. We integrate the reduced equations in both cases and obtain values at 
5 X 10^ points. We calculate the following quantities from the reduced equations as well as from 
the full Lorenz 96 equations: the mean, the standard deviation, the probability density function 
(pdf), the Kolmogorov-Smirnov statistic, the autocorrelation function (ACF) of xi, and the cross¬ 
correlation function (CCF) between xi and X 2 - The pdf of xi for the full Lorenz 96 system is well 
reproduced by both reduced systems when 5 = 0.01, see left top in Figure 1. In the sparser data 
case 5 = 0.05 the NARMAX equations produce a much better pdf than the POLYAR equations, 
see right top in Figure 1. Table 3 displays the mean, the standard deviation, and the Kolmogorov- 
Smirnov statistic that compare the cumulative distributions of the full Lorenz 96 system and with 
that of the reduced equations. The NARMAX system is more accurate than the POLYAR system in 
both cases. The autocorrelation function and the cross correlation function are well reproduced by 
both reduced systems when 5 = 0.01, see left middle and left bottom of Figure 1. When 5 = 0.05, 
however, the autocorrelations and the cross correlations reproduced by the POLYAR miss the 
amplitude of oscillation and exhibit a phase shift from those of the full Lorenz 96 equations, while 
the NARMAX system remains accurate, see right middle and right bottom of Figure 1. 

We now investigate how well a reduced system predicts the behavior of the full system by 
calculating mean trajectories of the reduced systems and comparing them with a true trajectory 
of the full Lorenz 96 system, as follows. First we integrate the full Lorenz 96 system for 10 x 
Nq time units, and store the results as Nq short trajectories of 10 time units each. For each 
short true trajectory, we perform N^ns integrations of the reduced systems over 10 time units, 
starting all ensemble members from the same several-step initial conditions as the corresponding 
full solution - several initial steps are needed to initialize 77 in POLYAR and ^ in NARMAX. 
We do not introduce artificial perturbations into the initial conditions, because the exact initial 
conditions for x are known, and by initializing rj and ^ from data, we preserve the memory of 
the system so as to generate better ensemble trajectories. We then average the solntions of the 
rednced equations in each snbinterval and compare these averages with the trajectories of the 
full system by calculating the root-mean-square error (RMSE) and anomaly correlation (ANCR) 
between them; the former measures the average difference between trajectories while the latter 
measures the average correlation between them; the formulas and their rationale can be found e.g. 
in [13]. 

Results for RMSE and ANCR are shown in Eigure 2, where we use Nq = 10000, Wns = 1, 5, 20 
and the nnmber of steps where initial conditions are imposed is uq = max{l,p, r, s, 2q\ -|- 1. In the 
case 5 = 0.01, the NARMAX reduction performs slightly better than the POLYAR reduction. In 
the case 5 = 0.05, the NARMAX rednction provides a significant improvement over the POLYAR 
rednction. Por example, the forecast lead time at which the anomaly correlation drops below 0.6 
is extended from r = 2 to r = 4 in the case Wns = 20. 












Figure 2: Root mean square error (RMSE) and anomaly correlation (ANCR) of ensemble fore¬ 
casting, produced by the NARMAX system (solid line) and the POLYAR system (dashed line), 
for different ensemble sizes: Ng,ns = 1 (circle marker), Yens = 5 (cross marker), and Yens = 20 
(asterisk marker). The left column is the case <5 = 0.01, and the right column is the case 5 = 0.05. 
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5 Conclusions 


We have presented a discrete approach to data-based dimensional reduction and stochastic parametriza- 
tion, in which the problem is consistently treated as discrete, obviating earlier difficulties in esti¬ 
mating noise from measurements and in approximating reduced continuum equations. Within this 
discrete approach, we have identified the reduced system via the NARMAX representation. This 
generalizes earlier work, in particular by making it easy to include memory effects in full. We have 
tested the resulting algorithm on the Lorenz 96 system of equations, often used as a test bench for 
dimensional reduction schemes; our construction did better than earlier work in reproducing the 
dynamics and the long-range statistics of the variables of interest, most conspicuously in a problem 
where the data were sparse. We related our representation to the Mori-Zwanzig formalism and 
suggested that our methods can be used to construct data-based implementations of this formal¬ 
ism. We expect the advantages of our modeling to become even more marked as it is applied to 
increasingly complex problems. 
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